12 research outputs found

    Accelerating exhaustive pairwise metagenomic comparisons

    Get PDF
    In this manuscript, we present an optimized and parallel version of our previous work IMSAME, an exhaustive gapped aligner for the pairwise and accurate comparison of metagenomes. Parallelization strategies are applied to take advantage of modern multiprocessor architectures. In addition, sequential optimizations in CPU time and memory consumption are provided. These algorithmic and computational enhancements enable IMSAME to calculate near optimal alignments which are used to directly assess similarity between metagenomes without requiring reference databases. We show that the overall efficiency of the parallel implementation is superior to 80% while retaining scalability as the number of parallel cores used increases. Moreover, we also show thats equential optimizations yield up to 8x speedup for scenarios with larger data.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec

    SiL: An Approach for Adjusting Applications to Heterogeneous Systems Under Perturbations

    Full text link
    Scientific applications consist of large and computationally-intensive loops. Dynamic loop scheduling (DLS) techniques are used to load balance the execution of such applications. Load imbalance can be caused by variations in loop iteration execution times due to problem, algorithmic, or systemic characteristics (also, perturbations). The following question motivates this work: "Given an application, a high-performance computing (HPC) system, and both their characteristics and interplay, which DLS technique will achieve improved performance under unpredictable perturbations?" Existing work only considers perturbations caused by variations in the HPC system delivered computational speeds. However, perturbations in available network bandwidth or latency are inevitable on production HPC systems. Simulator in the loop (SiL) is introduced, herein, as a new control-theoretic inspired approach to dynamically select DLS techniques that improve the performance of applications on heterogeneous HPC systems under perturbations. The present work examines the performance of six applications on a heterogeneous system under all above system perturbations. The SiL proof of concept is evaluated using simulation. The performance results confirm the initial hypothesis that no single DLS technique can deliver best performance in all scenarios, while the SiL-based DLS selection delivered improved application performance in most experiments

    On the Parallelization of Sequential Programs

    No full text

    Near Optimal Work-Stealing Tree Scheduler for Highly Irregular Data-Parallel Workloads

    No full text
    We present a work-stealing algorithm for runtime scheduling of data-parallel operations in the context of shared-memory architectures on data sets with highly-irregular workloads that are not known a priori to the scheduler. This scheduler can parallelize loops and operations expressible with a parallel reduce or a parallel scan. The scheduler is based on the work-stealing tree data structure, which allows workers to decide on the work division in a lock-free, workload-driven manner and attempts to minimize the amount of communication between them. A significant effort is given to showing that the algorithm has the least possible amount of overhead. We provide an extensive experimental evaluation, comparing the advantages and shortcomings of different data-parallel schedulers in order to combine their strengths. We show specific workload distribution patterns appearing in practice for which different schedulers yield suboptimal speedup, explaining their drawbacks and demonstrating how the work-stealing tree scheduler overcomes them. We thus justify our design decisions experimentally, but also provide a theoretical background for our claims

    OpenMP Loop Scheduling Revisited: Making a Case for More Schedules

    No full text
    In light of continued advances in loop scheduling, this work revisits the OpenMP loop scheduling by outlining the current state of the art in loop scheduling and presenting evidence that the existing OpenMP schedules are insufficient for all combinations of applications, systems, and their characteristics. A review of the state of the art shows that due to the specifics of the parallel applications, the variety of computing platforms, and the numerous performance degradation factors, no single loop scheduling technique can be a 'one-fits-all' solution to effectively optimize the performance of all parallel applications in all situations. The impact of irregularity in computational workloads and hardware systems, including operating system noise, on the performance of parallel applications, results in performance loss and has often been neglected in loop scheduling research, in particular, the context of OpenMP schedules. Existing dynamic loop self-scheduling techniques, such as trapezoid self-scheduling, factoring, and weighted factoring, offer an unexplored potential to alleviate this degradation in OpenMP due to the fact that they explicitly target the minimization of load imbalance and scheduling overhead. Through theoretical and experimental evaluation, this work shows that these loop self-scheduling methods provide a benefit in the context of OpenMP. In conclusion, OpenMP must include more schedules to offer a broader performance coverage of applications executing on an increasing variety of heterogeneous shared memory computing platforms

    Toward a Standard Interface for User-Defined Scheduling in OpenMP

    No full text
    Parallel loops are an important part of OpenMP programs. Efficient scheduling of parallel loops can improve performance of the programs. The current OpenMP specification only offers three options for loop scheduling, which are insufficient in certain instances. Given the large number of other possible scheduling strategies, standardizing each of them is infeasible. A more viable approach is to extend the OpenMP standard to allow a user to define loop scheduling strategies within her application. The approach will enable standard-compliant application-specific scheduling. This work analyzes the principal components required by user-defined scheduling and proposes two competing interfaces as candidates for the OpenMP standard. We conceptually compare the two proposed interfaces with respect to the three host languages of OpenMP, i.e., C, C++, and Fortran. These interfaces serve the OpenMP community as a basis for discussion and prototype implementation supporting user-defined scheduling in an OpenMP library

    Associations of apolipoprotein E gene with ischemic stroke and intracranial atherosclerosis.

    No full text
    The apolipoprotein E (APOE) epsilon4 allele is associated with elevated cholesterol and risk of atherosclerosis. However, its role in ischemic stroke (IS) remains controversial. We investigated a possible link between IS or the severity of intracranial atherosclerosis and the APOE promoter polymorphisms -219G/T and +113G/C, involved in regulating APOE transcription. We genotyped subjects from a multicentric Belgian case-control study, including 237 middle-aged patients with IS due to small- or large-vessel atherosclerotic stroke and 326 ethnicity- and gender-matched controls and a Finnish autopsy series of 1004 non-stroke cases, who had received a quantitative score of atherosclerosis in the circle of Willis. The APOE epsilon4+ genotype did not associate with IS, but was related to more severe intracranial atherosclerosis score in men (5.4 vs 4.6, P=0.044). Within the most common APOE epsilon3/epsilon3 genotype group, the risk of IS associated with the G-allele of the tightly linked -219G/T (OR=6.2; 95% CI: 1.6-24.3, P=0.009) and +113G/C (OR=7.1; 95% CI: 1.7-29.9, P=0.007) promoter polymorphisms. There was no difference in the severity of intracranial atherosclerosis between -219G/G genotype carriers and non-carriers. This study suggests a multifaceted role of apoE on the risk of cerebrovascular diseases. The APOE epsilon4+ genotype did not predict the risk of IS but was associated with severity of subclinical intracranial atherosclerosis in men on the autopsy study. In contrast, the promoter variants were significant predictors of IS, suggesting that quantitative rather than qualitative variation of apoE is related to IS.Comparative StudyJournal ArticleResearch Support, Non-U.S. Gov'tSCOPUS: ar.jinfo:eu-repo/semantics/publishe
    corecore